{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 在 CUDA 上部署已量化模型\n",
        "\n",
        "**原作者**: [Wuwei Lin](https://github.com/vinx13)\n",
        "\n",
        "本文是使用 TVM 进行自动量化的入门教程。自动量化是 TVM 中的量化方式之一。TVM 中量化的更多细节可以在 [Quantization Story](https://discuss.tvm.apache.org/t/quantization-story/3920) 找到。在本教程中，将在 ImageNet 上导入 GluonCV 预训练模型到 Relay，接着量化 Relay 模型，然后执行推理。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "加载 TVM 库："
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [],
      "source": [
        "import set_env"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [],
      "source": [
        "import mxnet_cuda\n",
        "\n",
        "import tvm\n",
        "from tvm import te\n",
        "from tvm import relay\n",
        "from tvm.contrib.download import download_testdata\n",
        "\n",
        "batch_size = 1\n",
        "model_name = \"resnet18_v1\"\n",
        "target = \"cuda\"\n",
        "dev = tvm.device(target)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 准备数据集\n",
        "\n",
        "演示如何准备用于量化的校准数据集。\n",
        "\n",
        "首先下载 ImageNet 的验证集并对数据集进行预处理。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import mxnet as mx\n",
        "from mxnet import gluon\n",
        "\n",
        "calibration_rec = download_testdata(\n",
        "    \"http://data.mxnet.io.s3-website-us-west-1.amazonaws.com/data/val_256_q90.rec\",\n",
        "    \"val_256_q90.rec\",\n",
        ")\n",
        "\n",
        "\n",
        "def get_val_data(num_workers=4):\n",
        "    mean_rgb = [123.68, 116.779, 103.939]\n",
        "    std_rgb = [58.393, 57.12, 57.375]\n",
        "\n",
        "    def batch_fn(batch):\n",
        "        return batch.data[0].asnumpy(), batch.label[0].asnumpy()\n",
        "\n",
        "    img_size = 299 if model_name == \"inceptionv3\" else 224\n",
        "    val_data = mx.io.ImageRecordIter(\n",
        "        path_imgrec=calibration_rec,\n",
        "        preprocess_threads=num_workers,\n",
        "        shuffle=False,\n",
        "        batch_size=batch_size,\n",
        "        resize=256,\n",
        "        data_shape=(3, img_size, img_size),\n",
        "        mean_r=mean_rgb[0],\n",
        "        mean_g=mean_rgb[1],\n",
        "        mean_b=mean_rgb[2],\n",
        "        std_r=std_rgb[0],\n",
        "        std_g=std_rgb[1],\n",
        "        std_b=std_rgb[2],\n",
        "    )\n",
        "    return val_data, batch_fn"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "校准数据集应该是可迭代对象。在 Python 中，将校准数据集定义为生成器对象。在本教程中，只使用一些样本进行校准。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "calibration_samples = 10\n",
        "\n",
        "\n",
        "def calibrate_dataset():\n",
        "    val_data, batch_fn = get_val_data()\n",
        "    val_data.reset()\n",
        "    for i, batch in enumerate(val_data):\n",
        "        if i * batch_size >= calibration_samples:\n",
        "            break\n",
        "        data, _ = batch_fn(batch)\n",
        "        yield {\"data\": data}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 导入模型\n",
        "\n",
        "使用 Relay MxNet 前端从 Gluon 模型动物园导入模型。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def get_model():\n",
        "    gluon_model = gluon.model_zoo.vision.get_model(model_name, pretrained=True)\n",
        "    img_size = 299 if model_name == \"inceptionv3\" else 224\n",
        "    data_shape = (batch_size, 3, img_size, img_size)\n",
        "    mod, params = relay.frontend.from_mxnet(gluon_model, {\"data\": data_shape})\n",
        "    return mod, params"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 量化模型\n",
        "\n",
        "在量化时，需要找到每一层的每个权重和中间 feature map 张量的 scale。\n",
        "\n",
        "对于权重，scale 是直接根据权重值计算的。支持 ``power2`` 和 ``max`` 两种模式。两种模式都首先在权重张量内找到最大值。在 ``power2`` 模式中，最大值被四舍五入到 2 的幂。如果权重和中间特征映射的比例都是 2 的幂，可以利用 bit shifting 进行乘法。这使得它的计算效率更高。在 ``max`` 模式下，以最大值作为 scale。在不 rounding 的情况下，``max`` 模式在某些情况下可能有更好的精度。当 scale 不是二的幂时，将使用定点（fixed point）乘法。\n",
        "\n",
        "对于中间 feature map，可以通过数据感知（data-aware）量化来找到 scale。数据感知量化将校准数据集作为输入参数。通过最小化量化前后激活分布之间的 KL 散度来计算 scale。或者，也可以使用预定义的 global scale。这节省了校准的时间。但准确性可能会受到影响。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def quantize(mod, params, data_aware):\n",
        "    if data_aware:\n",
        "        with relay.quantize.qconfig(calibrate_mode=\"kl_divergence\", weight_scale=\"max\"):\n",
        "            mod = relay.quantize.quantize(mod, params, dataset=calibrate_dataset())\n",
        "    else:\n",
        "        with relay.quantize.qconfig(calibrate_mode=\"global_scale\", global_scale=8.0):\n",
        "            mod = relay.quantize.quantize(mod, params)\n",
        "    return mod"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 运行推理\n",
        "\n",
        "创建 Relay VM 来构建和执行模型。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "[20:38:55] /work/mxnet/src/io/iter_image_recordio_2.cc:177: ImageRecordIOParser2: /home/pc/.tvm_test_data/val_256_q90.rec, use 4 threads for decoding..\n",
            "WARNING:autotvm:One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.\n",
            "[20:39:29] /work/mxnet/src/io/iter_image_recordio_2.cc:177: ImageRecordIOParser2: /home/pc/.tvm_test_data/val_256_q90.rec, use 4 threads for decoding..\n"
          ]
        }
      ],
      "source": [
        "def run_inference(mod):\n",
        "    model = relay.create_executor(\"vm\", mod, dev, target).evaluate()\n",
        "    val_data, batch_fn = get_val_data()\n",
        "    for i, batch in enumerate(val_data):\n",
        "        data, label = batch_fn(batch)\n",
        "        prediction = model(data)\n",
        "        if i > 10:  # only run inference on a few samples in this tutorial\n",
        "            break\n",
        "\n",
        "\n",
        "def main():\n",
        "    mod, params = get_model()\n",
        "    mod = quantize(mod, params, data_aware=True)\n",
        "    run_inference(mod)\n",
        "\n",
        "\n",
        "if __name__ == \"__main__\":\n",
        "    import os\n",
        "    os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'\n",
        "    os.environ['MXNET_CUDNN_LIB_CHECKING'] = '0'\n",
        "    main()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3.10.4 ('tvmx': conda)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.4"
    },
    "vscode": {
      "interpreter": {
        "hash": "e579259ee6098e2b9319de590d145b4b096774fe457bdf04260e3ba5c171e887"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
